Contempt 24 #1806

Closed

Conversation

Vizvezdenec
Contributor

I haven't seen any news on this so decided to make a PR...
passed non-regression STC vs c=0
http://tests.stockfishchess.org/tests/view/5bd6d7f80ebc595e0ae21e14
passed non-regression LTC vs c=0
http://tests.stockfishchess.org/tests/view/5bd6e0980ebc595e0ae21f07
Usually this is enough to set it to a new (higher) value. Also, because we recently decreased PawnValueMg, it's logical that higher contempt values don't regress now, because they are dependent on this value.

bench 3515380
@Atahan-Turkoglu

Master 2018-11-01 vs Stockfish 9

Contempt 21
ELO: 51.68 +-1.9 (95%) LOS: 100.0%
Total: 40000 W: 9487 L: 3581 D: 26932
http://tests.stockfishchess.org/tests/view/5bdb1a140ebc595e0ae2620a

Contempt 24
ELO: 52.21 +-2.0 (95%) LOS: 100.0%
Total: 40000 W: 9759 L: 3793 D: 26448
http://tests.stockfishchess.org/tests/view/5bdb1b680ebc595e0ae2620d
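
For readers who want to check how the ELO lines above follow from the W/L/D totals, here is a minimal sketch. It assumes the usual logistic Elo model and a normal approximation for the 95% margin; the struct and function names are hypothetical and not taken from fishtest.

```cpp
#include <cassert>
#include <cmath>

// Aggregate match result from the first engine's point of view.
struct MatchResult {
    long wins;
    long losses;
    long draws;
};

// Mean score per game (win = 1, draw = 0.5, loss = 0).
double meanScore(const MatchResult& r) {
    const long games = r.wins + r.losses + r.draws;
    assert(games > 0);
    return (r.wins + 0.5 * r.draws) / games;
}

// Logistic Elo difference implied by a mean score s, with 0 < s < 1.
double eloFromScore(double s) {
    assert(s > 0.0 && s < 1.0);
    return 400.0 * std::log10(s / (1.0 - s));
}

// Approximate 95% half-width of the Elo estimate (normal approximation,
// an assumption of this sketch rather than a documented fishtest formula).
double eloMargin95(const MatchResult& r) {
    const double n = static_cast<double>(r.wins + r.losses + r.draws);
    const double s = meanScore(r);
    const double variance = (r.wins * (1.0 - s) * (1.0 - s)
                           + r.losses * s * s
                           + r.draws * (0.5 - s) * (0.5 - s)) / n;
    const double delta = 1.96 * std::sqrt(variance / n);
    return (eloFromScore(s + delta) - eloFromScore(s - delta)) / 2.0;
}
```

Feeding in the Contempt 21 totals (W 9487, L 3581, D 26932) and the Contempt 24 totals (W 9759, L 3793, D 26448) should give values close to the 51.68 and 52.21 quoted above. The two margins overlap almost completely, so this data alone does not separate the two settings.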

@Rocky640

Rocky640 commented Nov 9, 2018

Your comment refers to PawnValueMg, but the dependency is on PawnValueEg, isn't it?
int ct = int(Options["Contempt"]) * PawnValueEg / 100; // From centipawns
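
To make the dependency concrete, here is a small sketch of the same conversion as a standalone function. The function name is hypothetical, and the pawn value is passed in as a parameter because the actual PawnValueEg constant is not stated in this thread.

```cpp
#include <cassert>

// Convert a contempt option given in centipawns into internal evaluation
// units, mirroring the expression quoted above. pawnValueEg is a parameter
// here because its real value is not given in this thread.
int contemptToInternal(int contemptCp, int pawnValueEg) {
    assert(pawnValueEg > 0);
    return contemptCp * pawnValueEg / 100;  // integer division truncates
}
```

With an illustrative pawn value of 250, a contempt of 24 maps to 60 internal units; with 200 it maps to 48. So lowering PawnValueEg shrinks the effective contempt for the same option value.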

@Vizvezdenec
Contributor Author

@Rocky640 PawnValueEg was also decreased, although not by that much. Yes, I was wrong, but the point still stands :)

@NKONSTANTAKIS

NKONSTANTAKIS commented Nov 10, 2018

Generally I am all in for contempt increases, but I have many reasons to be neutral about this:

  1. Very underwhelming result vs SF9.
  2. Increased risk of opening misplay.
  3. Diminishing returns on increased resolution vs increased variance for testing. (Lowering drawrate is good but up to a point, we seem to experience too many contradicting results)
  4. Inducing long-term overtuning to a spot further away from the self-play optimum. I'm not sure we need to become even better at beating weaker engines rather than better at objective chess.

Having said all that, I have to say that I am NOT against this change. However, I expect the next Leela networks to be really strong, and it's a good idea to prepare already. Hence what I considered some time ago a waste of resources I now regard as valuable knowledge: approximately locating the self-play optimal contempt for future use, which half a year ago was between 10 and 11.
So probably just a c10 v c12 v c14 head to head is all that we need.

Of course there is no hurry at all for these tests, and they should wait for a very idle network. Prio -1 is a nice alibi for not wasting resources, but in reality people are interested in those tests and are not as motivated to write patches as they are with a completely empty queue.

@snicolet
Member

I am in favor of this change.

@DragonMist

DragonMist commented Nov 12, 2018

My understanding of how contempt works in SF is that any contempt different from zero results in, strictly speaking, suboptimal play. But because of the drawish nature of chess, playing to keep the game more complex, if not optimal, creates more chances for any given opponent to make a mistake, which SF is good at punishing because of its tactical ability (and which is proportional to the complexity of the position).
I am therefore quite certain there is a point at which increasing contempt brings more and more side effects and becomes counterproductive. By counterproductive I mean, from observing its current play, weird moves that can be significantly below what would otherwise be a top-3 choice.
A simple test might shed a bit more light on this: STC with various contempt values (C=6, C=12, C=18, C=24, C=30) against current master with C=0. Would these 5 STC tests be too expensive to satisfy my curiosity?

@Vizvezdenec
Contributor Author

Sure @DragonMist this can happen. Contempt 27 didn't pass [-3;1], for example, and c=50 failed [-4;0] pretty fast.
The thing is that we have proof that contempt = 24 doesn't really regress in self-play (by >0.5 elo) and that it's better than 0 or 21 vs other engines (all measurements we did show this, and so do the rating lists).
Sure, it causes SF to play "suboptimally" in some positions, but it also causes SF to not take easy draws, so it can sometimes outplay even itself.
The thing is that when we decided that a non-0 default contempt is a good thing, we basically formed the rule for choosing this value: the highest value that passes [-3;1] double SPRT. And I honestly don't see any reason why it shouldn't be like this in the current case.
Sure, we can run 5 contempt values vs 0, and some will pass and some will fail (even an elo-neutral patch has something like a 30% chance of failing [-3;1]). What decision we can make based on this data is what I'm failing to understand. Let's say c=6, 12, 24 pass and 18 and 30 fail: what will we do with this?

@SFfan876

IMO the problem is with dynamic contempt, and I believe a patch that removes it will easily pass -3,1.

@NKONSTANTAKIS

NKONSTANTAKIS commented Nov 13, 2018

@Vizvezdenec @DragonMist wants a rough estimate of how different contempt values behave vs ct0. It's not info to help SF change something, it's just interesting info. SPRTs are useless for this; we can do cheap runs of fixed 10K-20K STC games with huge margins to see the elo fluctuation range: is it 2, 4 or 6 elo? Locating the contempt that maximizes self-play elo is valuable info for equal-rated opponents.

A reminder that half a year ago ct=10 passed clear green (0,4) LTC vs ct=12, but we preferred to keep 12
http://tests.stockfishchess.org/tests/view/5a968d710ebc590297cc8ceb
#1450
This is a very interesting thread, @DragonMist. It featured extensive testing showing that ct=8 is weaker than ct=10. Nowadays I am 99% certain that the self-play optimal ct is between 9 and 14. All of this range should beat ct=0 by 2-4 elo.

I think its very useful to have all the previous info and discussions together:
The original dynamic contempt: #1394
The per thread dyn cont: #1515
The fix for multi-PV: #1491
The logarithmic pull (with lots of test data): #1450
The S-curve v Log v Arctan pull: #1439
The simplification of Arctan by Ceebo: #1558
The reason we used (-3,1) for Ceebo: #1553
The 12 to 21 change (with extensive testing): #1646
The first! contempt 20 implementation: #1366
The initial! contempt 7 extensive discussion: #1361

@MichaelB7 I kinda agree, this stuff is vain as far as progress goes but at the same time interesting. It's something to do with a totally idle network.

But contempt is something which actually gives a 3rd dimension to the AB search: it has the potential to be used for giving more search depth to certain positions and less to others. Currently this focus comes from the very sensitive (ct, ct/2) gradient (half contempt for the endgame). What happens here is that the search is more reluctant to enter endgame territory, as it loses part of the contempt bonus of the eval. So we actually safeguard our middlegame plan: we protect against simplifying to a high-eval drawish endgame when we have other middlegame options. That is exactly where the self-play elo gains of contempt come from compared to ct=0.
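
The (ct, ct/2) gradient can be illustrated with a small sketch. It assumes the contempt bonus is interpolated linearly between its middlegame and endgame values by game phase, on a hypothetical 0..128 phase scale; the names are illustrative and not taken from the Stockfish source.

```cpp
#include <cassert>

// Hypothetical phase scale: 128 = pure middlegame, 0 = pure endgame.
constexpr int PhaseMax = 128;

// Contempt bonus interpolated between ct (middlegame) and ct / 2 (endgame).
// Linear interpolation by phase is an assumption of this sketch.
int taperedContempt(int ct, int phase) {
    assert(phase >= 0 && phase <= PhaseMax);
    const int mg = ct;
    const int eg = ct / 2;
    return (mg * phase + eg * (PhaseMax - phase)) / PhaseMax;
}
```

With ct = 48 the bonus is 48 in a full middlegame, 36 halfway, and 24 in a pure endgame. Under these assumptions, a line that trades down into an endgame gives up part of the bonus, which is the reluctance to simplify described above.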

One night, I could not sleep at all, had crazy inspirational brainstorming for hours, producing these ideas:
https://groups.google.com/forum/#!topic/fishcooking/6ir922ZvM5I
Next day I was feeling very bad due to waking up in the afternoon, tired. I questioned my mental involvement to the SF project, and computer chess in general, considering quitting for other activities.
I hope something good comes out of it.

For the next month I will be exploring Iran, probably abstaining completely from internet. I will be awaiting pleasant updates when I return!

@MichaelB7
Contributor

MichaelB7 commented Nov 13, 2018

I know my opinion is probably in the minority, but we spend way too much time on contempt. It's the vanity of vanities: here today, gone tomorrow. Just about every single scoring change will impact the optimal value for contempt. It reminds me of the Book of Ecclesiastes.

@DragonMist

Thank you @NKONSTANTAKIS for understanding, supporting and providing so much info in one place.

@Vizvezdenec
Contributor Author

Dynamic contempt removal wouldn't pass [-3;1]: it was run over 1 million games and performed at -2.5 elo with an error bar < 1 elo.
And about 10 passing vs 8... Sorry, but I believe it's just a fluke and nothing more.

@NKONSTANTAKIS

NKONSTANTAKIS commented Nov 14, 2018

You can believe what you want, but 10 passed clear green vs 12 as well, not just vs 8.
http://tests.stockfishchess.org/tests/view/5a968d710ebc590297cc8ceb
Also, there was an SPSA run on the atan formula with its 3 parameters: 2 of them were changing, but the base contempt parameter had start value 10 and end value 10.239.
http://tests.stockfishchess.org/tests/view/5a99b3ab0ebc590297cc8ef2
Sure, I agree that confidence was never much outside of the error bars, but still every single result was positive and with good scores:
http://tests.stockfishchess.org/tests/view/5a9a6a810ebc590297cc8f86
http://tests.stockfishchess.org/tests/view/5a9ba68e0ebc590297cc906f
http://tests.stockfishchess.org/tests/view/5a98fb040ebc590297cc8e6e

Those gains were on top of other gains, which were on top of other gains that were gaining +2 elo vs C0. So yeah, if you have chained positive tests on top of each other, arguments about confidence and flukes can't hold the castle, sorry.

For me it's fairly certain that a contempt value around 10-11 beats both ct=0 and ct=24 head to head by 2-4 elo. Vs c0 I am sure, and since ct24 barely passed (-3,1) vs ct0 I don't see why not vs ct24 as well.

Anyone interested in this could check it, it's just 1-2 tests. I think it's better to be clear on how much elo we are sacrificing in self-play for better results vs weaker engines (and the other goodies like decreased drawrate, more spectacular play etc.) than to operate in the shadows so as not to attract contempt opposition. Then we can select the contempt we like, but knowing exactly what is going on. @mcostalba has expressed a different viewpoint than @snicolet on this in the past, valuing self-play elo more. I think both worlds have their virtues.

@Vizvezdenec For me it's not too important to use resources for this because I am sure. But since you question me it automatically becomes important for me, because it's like you threw a glove at me, and I accept the challenge. @DragonMist requested it too. We don't need 20+ tests like you did vs SF7 etc., nor the 5 that DM asked for, just 1-2.
I propose C11 vs C0 with a fixed game count, since SPRT is useless in this case, as we won't be doing anything with it. The elo estimate, on the other hand, will be useful.

@vdbergh
Contributor

vdbergh commented Nov 14, 2018

@NKONSTANTAKIS

The value of contempt in self play (if any) will never be measured accurately. A fixed length test with a resolution of 1 elo needs roughly 170000 games. If you do multiple tests you need even more games (as the uncertainties accumulate).

People are unwilling to invest the proper amount of resources to obtain scientifically valid conclusions. I understand this is not the aim of the SF project, but one should be open about it and not try to keep up the pretence.

Note that you have to allocate sufficient resources before a test: if the resources are too low and you get a non-significant result, there is no conclusion, neither negative nor positive, and the resources are simply wasted.

PS. I wrote "if any" above since I could not duplicate the supposed elo gain in private testing. It may of course be that there is something wrong with my personal setup.
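
One plausible reading of the game counts discussed in this thread can be sketched as follows. The sketch assumes two nearly equal engines, a 95% normal-approximation margin, and a draw ratio supplied by the caller; the draw ratio of 0.63 used in the note below is an assumption chosen for illustration, not a figure from this thread.

```cpp
#include <cassert>
#include <cmath>

// Rough number of games needed for a 95% margin of +-marginElo on an Elo
// estimate, assuming near-equal engines (mean score about 0.5) and the
// given draw ratio. The function name and the model are illustrative.
double gamesNeeded(double marginElo, double drawRatio) {
    assert(marginElo > 0.0);
    assert(drawRatio >= 0.0 && drawRatio < 1.0);
    // Per-game score variance at mean 0.5 with wins and losses equally likely.
    const double variancePerGame = (1.0 - drawRatio) / 4.0;
    // Slope of the logistic Elo curve at score 0.5: d(Elo)/d(score).
    const double eloPerScore = 1600.0 / std::log(10.0);
    const double k = 1.96 * eloPerScore / marginElo;
    return variancePerGame * k * k;
}
```

With a draw ratio of 0.63 this gives roughly 170000 games for a margin of 1 elo and roughly 43000 games for a margin of 2 elo. Halving the margin quadruples the number of games.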

@ssj100

ssj100 commented Nov 14, 2018

I see it quite simply really. SF default contempt should be:

  1. The highest value that doesn't regress against contempt 0 based on -3,1 SPRT.
    The only pitfall (other pitfalls are purely speculative and subjective, which isn't the philosophy of the SF project, I think?) is that it allows the (very) small possibility that "default contempt SF" is objectively weaker head to head than "contempt 0 SF".
    However, even if objective weakness was "proven" in this situation (say with a 1 billion head to head game sample), it wouldn't be much weaker (well below 5 elo I'd guess).
    Furthermore, a "default contempt" of 12 or 24 (or somewhere around these values) gains much more than 5 elo against weaker engines, and this is objective - that is, there is objective elo gain relative to "contempt 0" of probably up to 30-50 elo (depending on how weak the engine is and how high the "default contempt" is).

You can rave on about statistics all you want (lies, damn lies, and statistics) and so can I (and do!). However, I'd suggest that ultimately, it's all about doing more "good" than "harm" (applies to perhaps everything in life?), and therefore, the perspective of 1. above seems reasonable to me.

@vdbergh
Contributor

vdbergh commented Nov 14, 2018

The highest value that doesn't regress against contempt 0 based on -3,1 SPRT.

This is already a flaw. An SPRT(-3,1) that fails does not mean a regression. The probability of a false negative is about 36%, which is much too high to allow any conclusion.
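
How often an SPRT passes or fails for a given true strength difference can be explored with Wald's classical approximation. The sketch below assumes normally distributed increments, ignores overshoot at the bounds, takes alpha = beta = 0.05, and treats the true difference as being in the same units as the SPRT bounds. All of these are assumptions of the sketch, so it will not necessarily reproduce the percentages quoted in this thread, which depend on the Elo model and draw ratio used by the testing framework.

```cpp
#include <cassert>
#include <cmath>

// Approximate probability that SPRT(elo0, elo1) accepts H1 ("passes") when
// the true difference is trueElo, using Wald's approximation. Hypothetical
// helper, not part of any testing framework.
double sprtPassProbability(double trueElo, double elo0, double elo1,
                           double alpha = 0.05, double beta = 0.05) {
    assert(elo1 > elo0);
    assert(alpha > 0.0 && alpha < 1.0 && beta > 0.0 && beta < 1.0);
    const double lower = std::log(beta / (1.0 - alpha));  // accept H0: fail
    const double upper = std::log((1.0 - beta) / alpha);  // accept H1: pass
    // Exponent from Wald's identity for a drifting Gaussian random walk.
    const double h = (elo0 + elo1 - 2.0 * trueElo) / (elo1 - elo0);
    if (std::fabs(h) < 1e-12)  // true value at the midpoint: driftless walk
        return -lower / (upper - lower);
    return (1.0 - std::exp(h * lower))
         / (std::exp(h * upper) - std::exp(h * lower));
}
```

By construction the pass probability is 0.05 at the lower bound, 0.95 at the upper bound, and 0.5 at the midpoint. Values in between show how wide the zone is in which a single pass or fail says little about the sign of the true difference.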

@NKONSTANTAKIS

NKONSTANTAKIS commented Nov 15, 2018

@vdbergh The proposition was not about accuracy, and definitely not for 170K games as you say; that would be a waste. Why would we need accuracy for something that we are not going to use? The reason for this is just to prove that a self-play elo gain exists for medium contempt. That way people can use it for best possible play or analysis instead of using default contempt or 0 contempt. 40K games for +-2 elo is more than enough, and at -1 prio it will run on an idle network. In fact, even at 20K games I estimate that the elo gain will be bigger than the margins.

I also want to make clear that I don't oppose raising the default contempt, and that I find this made-up rule of picking the highest value that passes (-3,1) vs 0 really good for picking a balance point. It is not important if the (-3,1) flukes in one direction or the other; it's just a way for us to not overdo it and at the same time keep people calm about self-play performance not diverging too much from optimal.

@snicolet
Member

Merged via 2a7213f, thanks!


10 participants